{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bayesian regression analysis in lmfit\n",
    "In this example we will perform a Bayesian regression analysis in the `lmfit` package. You'll need the `emcee` and `corner` packages installed. Both of these are available on PyPI. We start off with the requisite imports."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import lmfit\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets create the data we'll then fit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.linspace(1, 10, 250)\n",
    "np.random.seed(0)\n",
    "y = 3.0*np.exp(-x/2) - 5.0*np.exp(-(x-0.1) / 10.) + 0.1*np.random.randn(len(x))\n",
    "plt.plot(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll need a `Parameters` object to store the Parameters of our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p = lmfit.Parameters()\n",
    "p.add_many(('a1', 4), ('a2', 4), ('t1', 3), ('t2', 3., True))\n",
    "\n",
    "def residual(p):\n",
    "    v = p.valuesdict()\n",
    "    return v['a1']*np.exp(-x/v['t1']) + v['a2']*np.exp(-(x-0.1) / v['t2']) - y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll do a straightforward least-squares fit first, using the *Nelder-Mead* method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mi = lmfit.minimize(residual, p, method='Nelder', nan_policy='omit')\n",
    "lmfit.printfuncs.report_fit(mi.params, min_correl=0.5)\n",
    "plt.plot(x, y)\n",
    "plt.plot(x, residual(mi.params) + y, 'r')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above analysis ignored the uncertainty on each datapoint. So we'll add in a noise Parameter to address that. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add a noise parameter\n",
    "mi.params.add('noise', value=1, min=0.001, max=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the Bayesian analysis we have to calculate the log-posterior probability for the system. The log-posterior probability is the sum of the log-prior and log-likelihood probabilities. \n",
    "\n",
    "The log-prior encodes prior beliefs about the system. With the `lmfit.emcee` method the log-prior probability is assumed to be 0 if a `Parameter` is within its bounds. If it's outside the bounds the log-prior probability is `-np.inf`, i.e. impossible. This is known as a uniform prior.\n",
    "\n",
    "The log-likelihood function is the probability of the data given the model. The following function calculates the log-likelihood. Note how the data uncertainties (http://dan.iel.fm/emcee/current/user/line/) are included in the log-likelihood.\n",
    "\n",
    "Note, if you want a non-uniform prior, then you should include the log-prior and log-likelihood in a single function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is the log-likelihood probability for the sampling. We're going to estimate the\n",
    "# size of the uncertainties on the data as well.\n",
    "def lnprob(p):\n",
    "    noise = p['noise']\n",
    "    return -0.5 * np.sum((residual(p) / noise)**2 + np.log(2 * np.pi * noise**2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `lmfit.emcee` we first have to create a `lmfit.Minimizer` object. We're going to initialise it with the best fit from above.\n",
    "\n",
    "Then we do the sampling. We are going to burn in (discard) the first 300 steps out of a total of 1000. We are then going to thin, which reduces the amount of autocorrelation in the chain. Thinning by 20 means that 1 in 20 samples is kept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mini = lmfit.Minimizer(lnprob, mi.params)\n",
    "res = mini.emcee(burn=300, steps=1000, thin=20, params=mi.params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets plot the outcome with the `corner` package. Note the banana shapes, they arise out of non-normality for the parameter probability distribution. The elliptical nature of several parameter pairs indicates a high degree of correlation. The correlation parameter is returned as an attribute of the `res` OptimizeResult."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import corner\n",
    "a=corner.corner(res.flatchain, labels=res.var_names, truths=list(res.params.valuesdict().values()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are the output parameters, the correlations are also worked out. The `OptimizeResult` contains `chain` and `lnprob` attributes which contain the samples and corresponding log-probability. In addition, the Minimizer will then also contain a `sampler` attribute (a `emcee.EnsembleSampler` object) which can be used for other purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"median of posterior probability distribution\")\n",
    "print('--------------------------------------------')\n",
    "lmfit.report_fit(res.params)\n",
    "# find the maximum likelihood solution\n",
    "highest_prob = np.argmax(res.lnprob)\n",
    "hp_loc = np.unravel_index(highest_prob, res.lnprob.shape)\n",
    "mle_soln = res.chain[hp_loc]\n",
    "for i, par in enumerate(p):\n",
    "    p[par].value = mle_soln[i]\n",
    "print(\"\\nMaximum likelihood Estimation\")\n",
    "print('-----------------------------')\n",
    "print(p)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(x, y)\n",
    "plt.plot(x, residual(mi.params) + y, 'r', label='Nelder-Mead')\n",
    "plt.plot(x, residual(res.params) + y, 'black', label='emcee')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's calculate some uncertainty intervals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "quantiles = np.percentile(res.flatchain['t1'], [2.28, 15.9, 50, 84.2, 97.7])\n",
    "print(\"1 sigma spread\", 0.5 * (quantiles[3] - quantiles[1]))\n",
    "print(\"2 sigma spread\", 0.5 * (quantiles[4] - quantiles[0]))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
